Real-Time Prototype for Integration of Blind Source Extraction and Robust Automatic Speech Recognition

Authors

  • Francesco Nesta
  • Marco Matassoni
  • Hari Krishna Maganti
Abstract

This demo presents a real-time prototype for automatic blind source extraction and speech recognition in the presence of multiple interfering noise sources. Binaurally recorded mixtures are processed by a combined Blind/Semi-Blind Source Separation algorithm to obtain an estimate of the target signal. The recovered target signal is segmented and used as input to a real-time automatic speech recognition (ASR) system. To further improve recognition performance, noise-robust features based on Gammatone filters are used. The demo uses the data provided for the CHiME Pascal speech separation and recognition challenge as well as real-time mixtures recorded onsite. Users will be able to listen to the recovered target signal and compare it with the original mixture and the ASR output.
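The abstract does not say how the Gammatone-based features are computed, so the following is only a minimal sketch of one common way to derive log band energies from a Gammatone filterbank. Everything specific here is an assumption rather than the authors' method: the fourth-order filter with bandwidth factor 1.019, the Glasberg and Moore ERB formula, the 32 bands, the 25 ms frames with a 10 ms hop, and the hypothetical function names `erb_space`, `gammatone_ir`, and `gammatone_features`.

```python
import numpy as np

def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula, assumed)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def erb_space(f_low, f_high, n_bands):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    to_erb = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    to_hz = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    return to_hz(np.linspace(to_erb(f_low), to_erb(f_high), n_bands))

def gammatone_ir(fc_hz, fs, duration=0.032, order=4, b=1.019):
    """Sampled Gammatone impulse response, normalised to unit energy."""
    t = np.arange(int(round(duration * fs))) / fs
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb(fc_hz) * t)
    ir = envelope * np.cos(2.0 * np.pi * fc_hz * t)
    return ir / np.sqrt(np.sum(ir ** 2))

def gammatone_features(signal, fs, n_bands=32, frame_len=0.025, hop=0.010, f_low=50.0):
    """Return (frames x bands) log energies and the band centre frequencies."""
    centres = erb_space(f_low, 0.9 * fs / 2.0, n_bands)
    flen = int(round(frame_len * fs))
    fhop = int(round(hop * fs))
    n_frames = 1 + (len(signal) - flen) // fhop
    feats = np.empty((n_frames, n_bands))
    for k, fc in enumerate(centres):
        # Filter the whole signal with one band of the filterbank.
        band = np.convolve(signal, gammatone_ir(fc, fs), mode="same")
        for i in range(n_frames):
            frame = band[i * fhop : i * fhop + flen]
            # Log of mean power; the small constant avoids log(0) on silence.
            feats[i, k] = np.log(np.mean(frame ** 2) + 1e-10)
    return feats, centres
```

Calling `gammatone_features(x, 16000)` on a mono signal `x` returns one row of band energies per frame. An ASR front-end would typically apply further steps (for example a decorrelating transform and normalisation) before recognition; those are omitted because the abstract gives no detail on them.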

Related resources

A Two-Channel Acoustic Front-End for Robust Automatic Speech Recognition in Noisy and Reverberant Environments

An acoustic front-end for robust automatic speech recognition in noisy and reverberant environments is proposed in this contribution. It comprises a blind source separation-based signal extraction scheme and only requires two microphone signals. The proposed front-end and its integration into the recognition system are analyzed and evaluated in noisy living room-like environments according to th...

Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...

A Flexible Spatial Blind Source Extraction Framework for Robust Speech Recognition in Noisy Environments

Blind source extraction (BSE) is an attractive approach to enhance multichannel noisy speech data, as a preprocessing step for an automatic speech recognition system. BSE was successfully applied to the first Chime Pascal Challenge for improving the recognition rate of noisy commands in a small dictionary task. In this work we reviewed the BSE architecture and improved each system block in the ...

Nonverbal Communication in Spontaneous Speech Recognition

Verbal communication is the most obvious instrument used to express our thoughts and ideas; considering only this part of speech, without regard to its nonverbal part, may lead to overlooking important information in an utterance or even misunderstanding it. The development of an automatic system for recognition of facial expressions is a rather difficult task. Such a system must perform automatica...

Stereo-input speech recognition using sparseness-based time-frequency masking in a reverberant environment

We present noise-robust automatic speech recognition (ASR) using a sparseness-based underdetermined blind source separation (BSS) technique. As a representative underdetermined BSS method, we utilized time-frequency masking in this paper. Although time-frequency masking is able to separate target speech from interferences effectively, one should consider two problems. One is that masking does not...

Publication date: 2011